Joint COVID-19 and influenza-like illness forecasts in the United States using internet search information

Background: As the prolonged COVID-19 pandemic continues, severe seasonal influenza (flu) may occur alongside COVID-19. This could cause a “twindemic”, placing additional burdens on healthcare resources and public safety compared with those occurring in the presence of a single infection. Amid the rising trend of co-infections with the two diseases, forecasting both influenza-like illness (ILI) outbreaks and COVID-19 waves in a reliable and timely manner is more urgent than ever. Accurate, real-time joint prediction of the twindemic helps public health organizations and policymakers prepare adequately and make decisions. However, in the current pandemic, existing ILI and COVID-19 forecasting models face shortcomings under complex inter-disease dynamics, particularly because the two diseases have similar symptoms and healthcare-seeking patterns.

Methods: Inspired by the interconnection between ILI and COVID-19 activity, we combine related internet search data and bi-disease time series information for U.S. national- and state-level forecasts. Our proposed ARGOX-Joint-Ensemble adopts a new ensemble framework that integrates ILI and COVID-19 forecasting models to pool information between the two diseases and provide joint multi-resolution, multi-target predictions. Through a winner-takes-all ensemble, the framework adaptively selects the most predictive COVID-19 or ILI signals.

Results: In the retrospective evaluation, our model steadily outperforms alternative benchmark methods and remains competitive with other publicly available models in both point estimates and probabilistic predictions (including intervals).

Conclusions: The success of our approach illustrates that pooling information between ILI and COVID-19 leads to better forecasting performance than single-disease models for either disease.
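A minimal sketch of a winner-takes-all selection step is shown below, assuming that at each forecast date every candidate model's past forecasts are compared with the observed values over a trailing window, and the model with the lowest mean squared error supplies the forecast. The function name `winner_takes_all`, the MSE criterion, the default 15-week window, and applying the rule separately per target are illustrative assumptions, not the authors' exact implementation.

```python
from typing import Dict, List, Tuple


def winner_takes_all(
    past_preds: Dict[str, List[float]],
    past_truth: List[float],
    next_preds: Dict[str, float],
    window: int = 15,
) -> Tuple[str, float]:
    """Select the candidate whose recent forecasts had the lowest MSE.

    Hypothetical layout: past_preds[name][i] is model `name`'s forecast for
    the week whose observed value is past_truth[i]. Only weeks already
    observed at the forecast date are included, so the choice uses no
    future data. next_preds[name] is that model's forecast for the
    upcoming target week.
    """
    recent_truth = past_truth[-window:]
    best_name, best_mse = "", float("inf")
    for name, preds in past_preds.items():
        if len(preds) != len(past_truth):
            raise ValueError(f"{name}: predictions and observations are misaligned")
        recent = preds[-window:]
        mse = sum((p - y) ** 2 for p, y in zip(recent, recent_truth)) / len(recent_truth)
        # Strict '<' keeps the earlier-listed model when two models tie.
        if mse < best_mse:
            best_name, best_mse = name, mse
    # The ensemble output is the winning model's forecast for the next week.
    return best_name, next_preds[best_name]
```

For example, with observations [1, 2, 3, 4], a hypothetical "single_disease" model that predicted [1.5, 2.5, 3.5, 4.5], a hypothetical "bi_disease" model that predicted [1.1, 2.0, 2.9, 4.0], and a 3-week window, the call returns ("bi_disease", 5.1) when the next-week forecasts are {"single_disease": 5.6, "bi_disease": 5.1}. A shorter window lets the selection switch between models more quickly, at the cost of noisier comparisons.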


# Summary
In this manuscript, the authors claim that (separate) COVID-19 and seasonal influenza forecasts can be improved by incorporating both COVID-19 and influenza data into the forecasting models. To support this claim, the authors introduce several models for forecasting COVID-19 deaths, COVID-19 cases, and CDC influenza-like illness (ILI) data, and evaluate their performance at (USA) state and national levels over the period 2020-07-04 to 2021-03-05. The authors demonstrate that the COVID-19 case forecasts generated by models that incorporate both COVID-19 and influenza data are competitive with the best-performing models in the CDC COVID-19 forecasting ensemble.

# Overall impression
The simultaneous circulation of COVID-19 and seasonal influenza is a major concern for healthcare systems around the world. We should expect some correlation between COVID-19 cases and influenza-like illnesses, since these pathogens share common modes of transmission, and much of this correlation could presumably be explained by human mobility and mixing. Accordingly, the major claim in this paper is sensible and, as far as I am aware, has not been investigated in the existing literature. The results clearly support the authors' claim, and it is impressive that the proposed "ARGOX-Joint-Ensemble" model is competitive with the best-performing models in the CDC ensemble.
I appreciate that it can be difficult and time-consuming to package code and supporting materials so that they are both self-contained and readily usable by others. My aim in identifying several limitations in the provided repository (listed below) is to help the authors make it possible for others to reproduce this study.
- Although the provided `README.md` includes a brief summary of the underlying data, it does not include any user instructions.
- Each `R` source file uses hard-coded directories that are specific to the author's computer (e.g., `"~/Documents/Georgia_Tech/Research at GATECH/Research with Dr. Shihao Yang/COVID-19/FLU+COVID19"`).
- There are no instructions on how to obtain the necessary data files, how they should be named, and where they should be stored. Some files are downloaded by the provided scripts (to the hard-coded path shown above), but others are assumed to exist locally.
- Several of the provided source files rely on another file (`ILI_COVID_Data_Clean.R`) that is not included in the repository.
- The provided source files appear to cover only steps 1 and 2 of the method described in section 3.2 of the manuscript, and are missing the third step, which produces the "final winner-takes-all ensemble predictions".

Summary of Major Changes in this Revision:
• We expanded our forecasting and evaluation period by including the most recent data (now from 2020-07-04 to 2022-08-13), following the suggestion of reviewer #2. All tables and figures in the main draft and supplementary materials have been updated accordingly. The overall performance of our national- and state-level COVID-19 case, death, and %ILI predictions remains consistent, and we are able to capture the recent trends driven by the Omicron variant in early 2022 and summer 2022.
• We extended the forecasting horizon for %ILI at both the national and state level to 2 weeks, following the suggestion of reviewer #3. Our 2-week-ahead %ILI predictions remain accurate and robust throughout the evaluation period.
• In addition to the weekly point estimates (forecasts), ARGOX-Joint-Ensemble now also produces probabilistic forecasts and prediction intervals for all forecasting targets (COVID-19 cases, deaths, and %ILI) at both the national and state level, following the suggestions of reviewers #1 and #2. We calculate the 95% nominal prediction interval coverage of the 1-4 week ahead COVID-19 case and death forecasts and the 1-2 week ahead %ILI forecasts across 51 regions. We also calculate the weighted interval score (WIS), evaluated across 11 prediction intervals with α₁ = 0.02, α₂ = 0.05, α₃ = 0.1, …, α₁₁ = 0.9 (implying nominal coverages of 98%, 95%, 90%, …, 10%), following the CDC Forecast Hub's submission guidelines [35]. For COVID-19 case and death forecasts, we compare the prediction intervals' coverage and WIS with those of other publicly available methods submitted to the CDC Forecast Hub. For %ILI forecasts, we compare them with those of a lag-1 vector autoregressive (VAR) model. Details of the prediction intervals are given in a newly added section of the Supplementary Materials. All evaluations and analyses are included in the Results section, which still shows that our methods are competitive in short-term forecasting.
• We simplified the daily ILI data imputation, following reviewers #1 and #2, by setting each day's ILI value equal to the corresponding weekly ILI value. The results and conclusions still hold under this simpler imputation, and all results have been updated accordingly. We also retain the original imputation method as a sensitivity analysis in the Supplementary Materials.
• We modified the layout of our state-by-state COVID-19 case, death, and %ILI forecast evaluations in the Supplementary Material, and added details to the Results and Discussion sections examining the compared methods' performance during various periods, following the suggestions of reviewers #2 and #3. We now zoom in on two selected U.S. states, Georgia and North Carolina, for all three forecasting targets, and evaluate forecasting performance during different COVID-19 waves. We observe similar behavior among the compared methods in the other states as well.
• We added a state-level evaluation of COVID-19 case and death forecasting performance during three selected periods of rapidly changing dynamics, corresponding to the COVID-19 second wave (end of 2020), the wave driven by the Delta variant (summer 2021), and the wave driven by the Omicron variant (early 2022), following the suggestion of reviewer #2. The method comparisons and analyses have been added to the Supplementary Material and the Results section.
• We have added clarifications to the Introduction and Discussion sections, following the suggestions of reviewers #1 and #2.
• We have deposited our organized code on GitHub (https://github.com/stevenmsm/Joint-COVID-19-and-Influenza-Forecasts-in-the-United-States), with detailed instructions for executing the code to reproduce the results shown in this study. We have also deposited the online search data used in this study in the Harvard Dataverse (DOI: 10.7910/DVN/PGNBAX).
• We renamed our proposed methods in the revised manuscript to clarify and simplify their meanings. We add "single-disease" before a method's name to indicate that the method uses only one disease's information, and "bi-disease" to indicate that the method uses both COVID-19 and ILI information. For national-level predictions, we renamed the previous "ARGO-Joint" as "ARGO-Nat", which produces the final national-level forecasts for all targets. For state-level predictions, we renamed the previous "ARGOX-Idv" as "Bi-disease ARGOX-Local", which is the proposed method in Step 2 (Figure 2).

Response: We thank the reviewer for providing detailed and insightful comments. We acknowledge the limitations that the reviewer pointed out. We have revised the paper accordingly (all changes are written in blue) and addressed all concerns and comments raised by the reviewer in the point-by-point responses below. The constructive suggestions have greatly helped us improve the clarity and quality of our draft and are highly appreciated.
We want to emphasize that our proposed model is intentionally a straightforward and principled data-driven approach that leverages Google search data and is inspired by prior influenza research, which helps prevent overfitting. The model is unified for both national- and state-level forecasts of both diseases and performs reasonably well against baseline models and other publicly available benchmark models. We have added a subsection on "our contribution" to the Introduction, which now reads: "The ensemble framework is systematic and comprehensive, and each sub-model within the framework is intentionally straightforward and unified to prevent overfitting. Numerical comparisons show that our method performs competitively with other publicly available single-disease forecasting methods.
This study further emphasizes the general applicability and the predictive power of online search data for various tasks in disease surveillance." As the reviewer mentioned, the relationship between the two diseases is complex [2] and can be heavily affected over time by various factors, including other diseases (e.g., RSV) and COVID-19 variants. This is one of the biggest challenges in this forecasting task, as we can neither foresee nor manually inform the model of all the external factors and possible contamination during the prediction period. In addition, we are aware that our estimation targets, the JHU COVID-19 dataset (cases, deaths) and CDC's %ILI, can be unreliable. The JHU COVID-19 dataset [42], the curated dataset used by the CDC on its official website, retrospectively corrected past confirmed cases and deaths because of reporting errors, especially during the early stage of COVID-19. %ILI, released by the CDC every week, is the ground truth for flu and is the percentage of outpatient visits with influenza-like illnesses. It is a proxy for the true flu incidence in the population, as it is calculated from a sample of outpatient visits with influenza-like symptoms.

Response: We thank the reviewer for pointing out that we did not use hospitalization data (daily new COVID-19-related hospital admissions). Following the reviewer's suggestion, we now incorporate hospitalizations as one of the input features for national-level COVID-19 death forecasts, and slightly modify the order of the input features for all of our targets. In short, at the national level, in addition to lagged ILI information, we now use lagged COVID-19 case information to predict COVID-19 cases, and lagged COVID-19 deaths and hospitalizations to predict COVID-19 deaths. This modification follows the intuitive timeline of an infected patient's journey from testing positive for COVID-19 to death.
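To illustrate how lagged inputs such as deaths and hospitalizations can be assembled without forward-looking information, the sketch below builds a design matrix in which every feature in a row is observed at or before the forecast week t, and the label is the target h weeks later. The function name `lagged_design`, the plain-list data layout, and the lag count are hypothetical; the authors' models also use Google search features and their own fitting procedure, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List, Sequence, Tuple


def lagged_design(
    features: Dict[str, Sequence[float]],
    target: Sequence[float],
    lags: int,
    horizon: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs for an h-week-ahead forecast from lagged inputs.

    Row t holds the last `lags` values of every feature series up to and
    including week t, and its label is target[t + horizon]. Because no
    feature value after week t enters row t, a model refit on these rows in
    a rolling window never uses forward-looking information.
    """
    n = len(target)
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(lags - 1, n - horizon):
        row: List[float] = []
        for name in sorted(features):  # fixed, reproducible column order
            row.extend(features[name][t - lags + 1 : t + 1])
        X.append(row)
        y.append(target[t + horizon])
    return X, y
```

For example, with weekly deaths [1, 2, 3, 4, 5], hospitalizations [10, 20, 30, 40, 50], two lags, and a one-week horizon, the first row is [1, 2, 10, 20] with label 3, and the last row is [3, 4, 30, 40] with label 5.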
With hospitalizations incorporated as an additional time series feature, our COVID-19 death predictions remain accurate and robust during the rapidly changing dynamics driven by COVID-19 variants. The state-level forecasting structures remain the same, as shown in Figure 2.

Response: We thank the reviewer for pointing out the reporting issues in COVID-19 case and death counts. Indeed, the ground truth for both diseases can be unreliable, which is an inherent limitation of these forecasting tasks. In particular, as the reviewer mentioned, COVID-19 case counts can be subject to substantial reporting issues given the availability of rapid antigen tests, while COVID-19 death counts may be subject to retrospective revisions. Reported COVID-19 cases are indeed only a relative proxy for the infection status of the general population. Nevertheless, the JHU-published COVID-19 confirmed case counts are still used by the CDC as the ground truth [42].
Moreover, by using COVID-19 data published by the New York Times on GitHub (which has no retrospective revisions) as input and focusing on COVID-19 death predictions, we are able to produce accurate forecasts that are valuable for optimizing resource allocation and healthcare interventions at both the national and state level. In the Discussion section, we have added a paragraph on the limitations of the gold-standard ground truth (please also see the Response to the first comment).
We added a sentence to the Discussion section acknowledging this limitation, which reads: "In addition, it should be noted that our estimation targets, JHU

Response: We thank the reviewer for pointing out the potential limitation of our current imputation strategy. Indeed, %ILI is the percentage of outpatient visits with influenza-like illnesses, and is computed by dividing the number of outpatient visits with influenza-like illnesses by the total number of outpatient visits in the region and time period of interest.
Our previous imputation mechanism did not impute the numerator and denominator separately, which was an oversight. However, separately imputing the numerator and denominator for all states and regions is intractable. Therefore, we adopt a simpler imputation in which each day's ILI value equals that week's ILI value; in effect, we assume that the daily number of patients with ILI symptoms is constant within each week. This change of imputation method preserves the accuracy and robustness of our COVID-19 forecasts for all targets during the evaluation period from 2020-07-04 to 2022-08-13, as shown in the revised Results and Discussion sections. The imputation method details and results have been updated accordingly in the manuscript and Supplementary Materials. The consistency of the results regardless of imputation method also serves as evidence of the robustness of our study. Part of the previous imputation method and the corresponding results have been moved to the Supplementary Materials section "ILI Imputation Method Sensitivity Analysis" as an additional sensitivity analysis of the imputation.

Response: We thank the reviewer for the suggestion of including probabilistic forecasts. Indeed, probabilistic forecasts can provide additional predictive power and interpretability, especially when the ground truth can be unreliable. Therefore, in addition to the weekly estimates, the ARGOX-Joint-Ensemble method now also produces prediction intervals for all forecasting targets (COVID-19 cases, deaths, and %ILI) at both the national and state level.
ARGOX-Joint-Ensemble's prediction interval (PI) is formed by taking the prediction intervals of the selected ensemble methods. Tables S12-S14 show the coverage and weighted interval score (WIS) of these prediction intervals.
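For readers unfamiliar with the WIS, the sketch below computes it from a predictive median and a set of central prediction intervals, following the standard definition used by the COVID-19 Forecast Hub: each central (1 − α) interval contributes an interval score weighted by α/2, the absolute error of the median is weighted by 1/2, and the sum is divided by K + 1/2 for K intervals. The eleven α levels mirror those listed in the revision summary; the function names and the dictionary layout are illustrative assumptions, not the authors' code. A small helper for empirical interval coverage is included as well.

```python
from typing import Dict, Sequence, Tuple

# The 11 alpha levels 0.02, 0.05, 0.1, 0.2, ..., 0.9
# (nominal coverages 98%, 95%, 90%, ..., 10%).
ALPHAS = (0.02, 0.05) + tuple(round(0.1 * k, 1) for k in range(1, 10))


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]."""
    score = upper - lower  # sharpness: width of the interval
    if y < lower:
        score += 2.0 / alpha * (lower - y)  # penalty when the observation falls below
    elif y > upper:
        score += 2.0 / alpha * (y - upper)  # penalty when the observation falls above
    return score


def weighted_interval_score(
    median: float, intervals: Dict[float, Tuple[float, float]], y: float
) -> float:
    """WIS for one forecast; intervals maps alpha -> (lower, upper)."""
    k = len(intervals)
    total = 0.5 * abs(y - median)  # weight 1/2 on the median's absolute error
    for alpha, (lower, upper) in intervals.items():
        total += alpha / 2.0 * interval_score(lower, upper, y, alpha)  # weight alpha/2
    return total / (k + 0.5)


def empirical_coverage(
    lowers: Sequence[float], uppers: Sequence[float], ys: Sequence[float]
) -> float:
    """Fraction of observations falling inside their prediction intervals."""
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lowers, uppers, ys))
    return hits / len(ys)
```

As a check, a single forecast with median 2, one 50% interval [1, 3], and an observed value of 4 gives an interval score of 2 + 4 = 6 and a WIS of (0.5·2 + 0.25·6)/1.5 ≈ 1.67. Lower WIS is better; averaging it over regions and weeks gives the summary values typically reported.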

Summary: In this analysis, Ma et al. develop and evaluate point-estimate forecasting frameworks that pool information between influenza-like illness (ILI) and COVID-19 surveillance data to produce forecasts for COVID-19 reported cases and deaths and percent ILI from the U.S. ILINet system. The authors retrospectively evaluate performance of these approaches at the national and state-level from early 2020 through early 2022 using RMSE, MAE, and correlation accuracy metrics.
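The three accuracy metrics named in this summary are standard; a minimal, dependency-free sketch of how they can be computed for a forecast series against observed values is given below. The function names are illustrative, and the sketch assumes equal-length series with non-zero variance (the correlation is undefined for a constant series).

```python
import math
from typing import Sequence


def _check_lengths(pred: Sequence[float], truth: Sequence[float]) -> None:
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and equally long")


def rmse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean squared error."""
    _check_lengths(pred, truth)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, truth)) / len(truth))


def mae(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Mean absolute error."""
    _check_lengths(pred, truth)
    return sum(abs(p - y) for p, y in zip(pred, truth)) / len(truth)


def pearson(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Pearson correlation between forecasts and observations."""
    _check_lengths(pred, truth)
    n = len(truth)
    mean_p, mean_y = sum(pred) / n, sum(truth) / n
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(pred, truth))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in truth))
    return cov / (sd_p * sd_y)
```

RMSE and MAE are in the units of the target (counts or percentage points), so they are usually compared within a target and region, whereas the correlation is scale-free and rewards tracking the shape of the curve even when the level is biased.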
Response: We thank the reviewer for the summary and for providing insightful and constructive comments, which greatly improved the draft's quality and clarity. We have revised the paper accordingly (all changes are written in blue) and addressed all concerns and comments raised by the reviewer in the point-by-point responses below.
Comments: Specific suggestions follow:
• From a conceptual point of view, the authors should be clearer that they are forecasting ILI, not influenza. ILI is affected by COVID, influenza, and other respiratory pathogens. Therefore, these are joint forecasts for COVID and ILI, not influenza, and the authors have not supported their statement that there is an "affinity between influenza and COVID-19's growth", only that ILI activity and COVID-19 transmission may be related to each other. In the US, since the COVID-19 pandemic began, we have witnessed one season with almost no influenza transmission and one season with historically low influenza transmission. This likely resulted from behavioral interventions adopted to prevent the spread of COVID-19 and possible viral interference between COVID-19 and influenza.
Response: We thank the reviewer for pointing out the difference between ILI and influenza. Indeed, the weekly %ILI report published by the CDC at different geographical resolutions gives the percentage of outpatient visits with influenza-like illnesses, which is a proxy for the true flu incidence in that area. Forecasting ILI is generally (and loosely) referred to as tracking influenza activity in the existing literature [7,12,14,17,22,25].
Because of its symptomatic similarities with COVID-19 and other respiratory pathogens, %ILI could overestimate true influenza incidence, especially during the initial outbreak of COVID-19 (early 2020) and the recent COVID-19 Omicron variant outbreaks (early 2022 and summer 2022). This is indeed a limitation of the gold-standard ground truth.
However, government and healthcare officials still treat %ILI as the gold standard, and it remains useful to track %ILI for early detection of influenza outbreaks and to implement corresponding prevention measures and interventions. In this study, we aim to use the affinity between COVID-19 and %ILI to enhance previously proposed single-disease forecasting frameworks for joint disease forecasting. We have revised the wording throughout accordingly, and all changes are written in blue. We have also added a paragraph to the Discussion section on the limitations of the gold-standard ground truth.

Response: We thank the reviewer for the suggestion of including probabilistic forecasts.
Indeed, probabilistic forecasts can provide additional predictive power and interpretability, and are required by the COVID-19 Forecast Hub. Therefore, in addition to the weekly estimates, the ARGOX-Joint-Ensemble method now also produces prediction intervals for all forecasting targets (COVID-19 cases, deaths, and %ILI) at both the national and state level. ARGOX-Joint-Ensemble's prediction interval (PI) is formed by taking the prediction intervals of the selected ensemble methods. Tables S12-S14 show the coverage and weighted interval score (WIS) of the prediction intervals across all 51 states for 1-4 week ahead COVID-19 case/death predictions and 1-2 week ahead %ILI predictions, respectively. The WIS is evaluated across 11 prediction intervals (nominal coverages of 98%, 95%, 90%, …, 10%), rolling-window forecast, we only use the input features that are available at the time of forecast. We do not use any forward-looking information. We also included a detailed data collection and prediction schedule in the Supplementary Material (section "ARGOX-Joint-Ensemble") to further clarify our "real-time" prediction generation scheme. Please also see the Response to the next comment for more details on the indexing of the rolling-window forecasts.
Moreover, we have added another paragraph in the discussion section, elaborating on the limitation of the retrospective nature of our study: "Another limitation is the retrospective nature of this study.
Although we are not using any "forward looking" information that wouldn't be available at the time of prediction to reflect "real-time"

Response: We thank the reviewer for pointing out the potential confusion about our forecasting activities. We want to emphasize that we are conducting rolling-window forecasts, which means that all the input features we use are available at the time of forecast. We do not use any forward-looking information, and we restrict our approach to match those that report to the CDC Forecast Hub in "real time". We expand on this in detail in the Supplementary Materials, Section "Modification of Previously Proposed COVID-19 and Influenza Methods", Subsection "ARGOX-Joint-Ensemble". A portion of the paragraph now reads: "Here, we want to state that we are careful with the indexing and are not using any forward-looking information when we use the four sub-models (Figure 2 Step 2) to obtain COVID-19 case/death estimates. Moreover, we simplified our ILI imputation approach. Since %ILI is the percentage of outpatient visits with influenza-like illnesses, computed by dividing the number of outpatient visits with influenza-like illnesses by the total number of outpatient visits in the region and time period of interest, our previous imputation mechanism, which did not impute the numerator and denominator separately, was an oversight. However, separately imputing the numerator and denominator for all states and regions is intractable. Therefore, we adopt a simpler imputation in which the daily ILI value is the weekly ILI value. Essentially, we now assume that the daily number of patients with ILI symptoms is constant within the week. This change of imputation method still

• Flu activity increased after the end of the study period in the US while COVID activity decreased. The authors should extend the study period to capture this period to see how these methods perform.
Response : We thank the reviewer for the comment regarding extending the study period.
We have expanded our evaluation period to 2020-07-04 through 2022-08-13 by including the most recent data, and all tables and figures in the main draft and supplementary materials have been updated accordingly. The overall performance of our national- and state-level 1-4 week ahead predictions remains consistent, and we are able to capture the recent trends driven by the Omicron variant in early 2022 and summer 2022.
• The authors should look at performance of the specific methods during periods of rapidly changing dynamics (e.g. peaking periods, increasing periods) to see how the methods perform when forecasts are most challenging.
Response: We thank the reviewer for emphasizing the analysis of specific periods. We have added forecasting performance evaluations and analyses for both COVID-19 cases and deaths during three periods of rapidly changing dynamics. Table S10 shows comparisons of 1-4 week ahead state-level COVID-19 death prediction performance during three selected periods of rapidly changing dynamics: COVID-19 second wave Table S16, S17, Figure S12, S13). Figure S13) as well as other states."

Response: We thank the reviewer for providing a detailed summary and insightful comments. We also thank the reviewer for the encouraging words on this important problem. We have revised the paper accordingly (all changes are written in blue) and addressed all concerns and comments raised by the reviewer in the point-by-point responses below. The constructive suggestions have greatly helped us improve the clarity and quality of our draft and are highly appreciated.

Comments:
1. The "winner takes all" approach is not described in enough detail to be reproduced. week ahead targets)?
If not, how are the models evaluated?
Response: We thank the reviewer for pointing out the unclear explanation of our "winner-takes-all" approach. We explain the approach in detail below, and have also added a new [12,26,29,39]. However, the proposed ARGO-Nat (national-level) and ARGOX-Joint-Ensemble (state-level) gradually lose their predictive power for %ILI when the forecasting horizon extends to 3 and 4 weeks, due to signal deterioration in influenza-related Google search data and COVID-19 confirmed cases; we therefore focus on 1-2 week ahead %ILI predictions in this study.
One hypothesis is that COVID-19 symptoms and the contagious period last longer than those of influenza [57], so the COVID-19 time series and related Google search information remain more predictive of COVID-19 at longer forecasting horizons than the corresponding signals are of %ILI. We have added this explanation to the Discussion section.

If the authors used a 15-week evaluation period to select the best model for each forecast target, did they also consider shorter evaluation periods to allow the ensemble to switch more rapidly between the competing models?
Response: We thank the reviewer for pointing out that the results may differ under different "hyperparameter" values. We apologize again for the unclear description of the training period of the "winner-takes-all" step.

Response: We thank the reviewer for pointing out the potential early-warning "feature".
Indeed, one of the ultimate goals of forecasting is to give early warnings for healthcare interventions and resource allocation. With the help of bi-disease ARGOX-Local (pooled model), ARGOX-Joint-Ensemble is robust in detecting upcoming surges, especially for 1-2 week ahead short-term forecasts, and is less prone to overfitting and overestimation during peak periods. However, due to signal deterioration in Google search and ILI data, ARGOX-Joint-Ensemble gradually loses its predictive power for COVID-19 as the forecast horizon extends (similar to %ILI).
Nevertheless, ARGOX-Joint-Ensemble still consistently outperforms the baseline time series benchmark and the persistence model on average, and performs reasonably well against other CDC-published COVID-19 forecasts. Furthermore, ARGOX-Joint-Ensemble now also produces prediction intervals (Tables S12-S14) with robust coverage for 1-2 week ahead forecasts, which compensates for the deterioration of its longer-term forecasts.
The deterioration of long-term forecasts is indeed a limitation, evident in both the point estimates (Tables 1-5) and the prediction intervals (Tables S12-S14). Because of the data-driven nature of the models, we cannot predict the onset and end of a disease season, and the models therefore offer no significant long-term insights (such as peak timing or the onset of a wave). We discussed this limitation in the Discussion, which reads: "Like all big-data-based models, our model has its limitations.

1. It would be useful to highlight the study period in Figure 1. The authors might also consider including similar plots for the national data.
Due to space limitations, we omit a national-level counterpart of Figure 1 in this paper. Tables figures. We also organized the compared methods (in Tables 1, 3, and 5) as bullet points, according to the ensemble's hierarchy (see Figure 2).

Response: We thank the reviewer for pointing out the duplicated references. We have now removed one of them.

Replies to Comments Provided by Reviewer #2:
Summary: The authors have addressed most of the concerns raised in the previous review, and I greatly appreciate the amount of effort that went into the revision.
Response: We thank the reviewer for the thorough and insightful comments, which greatly helped us improve the clarity and quality of the manuscript. We have revised the paper accordingly (all changes are written in blue) and addressed all concerns and comments raised by the reviewer in the point-by-point responses below.

Response: We thank the reviewer for the suggestion of reframing the analysis from influenza forecasting to ILI forecasting, and of citing the CDC FluView description of outpatient respiratory illness surveillance. We have now reframed the manuscript's analysis as influenza-like illness forecasting, which includes the following changes:
• We have changed the wording throughout, in line with the CDC FluView description above. For instance, we changed all instances of "Influenza" to "Influenza-like Illness".

Comment: Because of the above, the CDC-led forecasting activities switched their forecast target from ILI to flu-associated hospitalizations. Therefore, the authors need to be clear that their forecast target is not an indicator of influenza alone and that COVID-19 can contribute significantly to the signal. This is best illustrated by the last season in the US, when ILI was much higher in Winter 2021 than in Spring 2022, even though there was more flu activity in spring than in winter.